Fix tokenizer loading for GPT2 #757

Merged · 1 commit · Oct 18, 2024

@yusefes (Contributor) commented on Oct 17, 2024

Fixes #752

Fix the issue with loading the tokenizer for 'gpt2'.

* **scrapegraphai/utils/tokenizer.py**
  - Add a check for `GPT2TokenizerFast` in the `num_tokens_calculus` function.
  - Import `GPT2TokenizerFast` from `transformers`.

* **scrapegraphai/utils/tokenizers/tokenizer_ollama.py**
  - Modify the `num_tokens_ollama` function to handle `GPT2TokenizerFast`.

* **tests/graphs/smart_scraper_ollama_test.py**
  - Add a test case to verify the tokenizer loading for `GPT2TokenizerFast`.
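
A minimal sketch of the kind of dispatch described in the list above, assuming every tokenizer exposes an `encode(text)` method that returns a list of ids (true of both tiktoken encodings and Hugging Face tokenizers). This is not the code merged in this PR: `count_tokens_sketch` and `WhitespaceTokenizer` are hypothetical names, and looking up `GPT2TokenizerFast` through `sys.modules` instead of a module-level import is an illustrative design choice, not what the PR does.

```python
import sys
from typing import Any


def count_tokens_sketch(text: str, tokenizer: Any) -> int:
    """Hypothetical helper: count the tokens `tokenizer` produces for `text`.

    Assumes `tokenizer.encode(text)` returns a list of token ids.
    """
    # Only look up GPT2TokenizerFast if transformers is already imported:
    # an object cannot be an instance of that class otherwise, so this
    # avoids importing transformers on paths that never use it.
    transformers = sys.modules.get("transformers")
    gpt2_fast = getattr(transformers, "GPT2TokenizerFast", None)

    if isinstance(gpt2_fast, type) and isinstance(tokenizer, gpt2_fast):
        # Hugging Face encode() accepts add_special_tokens; turn it off
        # so only the text itself is counted.
        return len(tokenizer.encode(text, add_special_tokens=False))

    # Any other tokenizer with encode(text) -> list of ids,
    # e.g. a tiktoken Encoding.
    return len(tokenizer.encode(text))


class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer used only to demonstrate the call."""

    def encode(self, text: str) -> list:
        return text.split()


print(count_tokens_sketch("one two three", WhitespaceTokenizer()))  # prints 3
```

Because an instance of `GPT2TokenizerFast` can only exist once `transformers` has been imported, checking `sys.modules` lets callers that use other tokenizers skip importing `transformers` entirely.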

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/ScrapeGraphAI/Scrapegraph-ai/issues/752?shareId=XXXX-XXXX-XXXX-XXXX).

A collaborator commented:

Thank you for the test.

A collaborator commented:

Never seen this before, thank you.

@VinciGit00 merged commit bde1e0f into ScrapeGraphAI:main on Oct 18, 2024 (3 checks passed).

🎉 This PR is included in version 1.26.6 🎉

The release is available on:

Your semantic-release bot 📦🚀

@VinciGit00 (Collaborator) commented:

Hi, I think there is a problem here: I don't want to install TensorFlow or PyTorch.
[Screenshot taken 2024-10-18 at 17:35:58]


🎉 This PR is included in version 1.27.0-beta.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

Linked issue: Can't load tokenizer for 'gpt2' (#752)

2 participants